Audio-Visual Tracking of Concurrent Speakers
Abstract
Audio-visual tracking of an unknown number of concurrent speakers in 3D is a challenging task, especially when sound and video are collected with a compact sensing platform. In this paper, we propose a tracker that builds on generative and discriminative audio-visual likelihood models formulated in a particle filtering framework. We localize multiple concurrent speakers with a de-emphasized acoustic map assisted by the image detection-derived observations. The 3D multi-modal observations are either assigned to existing tracks for discriminative likelihood computation or used to initialize new tracks. The generative likelihoods rely on the color distribution of the target and the de-emphasized acoustic map value. Experiments on the AV16.3 and CAV3D datasets show that the proposed tracker outperforms the uni-modal trackers and the state-of-the-art approaches both in 3D and on the image plane.
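As a rough illustration of the particle filtering idea mentioned in the abstract, the following sketch shows one predict-update-resample step for a single track in 3D. It is not the tracker proposed in the paper: the isotropic Gaussian likelihoods are stand-ins for the acoustic-map and color-based likelihoods, the random-walk motion model and all sigma values are assumptions, and track initialization and observation-to-track assignment are left out. The function and parameter names are hypothetical.

```python
import math
import random


def gaussian_likelihood(particle, observation, sigma):
    """Unnormalized isotropic Gaussian likelihood of a 3D observation given a particle."""
    sq_dist = sum((p - o) ** 2 for p, o in zip(particle, observation))
    return math.exp(-0.5 * sq_dist / sigma ** 2)


def particle_filter_step(particles, audio_obs, video_obs, rng,
                         motion_sigma=0.05, audio_sigma=0.3, video_sigma=0.1):
    """One predict-update-resample step for a single track.

    particles: list of (x, y, z) tuples.
    audio_obs, video_obs: (x, y, z) position observations, or None if missing.
    All sigma values are illustrative assumptions, not taken from the paper.
    """
    # Predict: random-walk motion model (an assumption made for simplicity).
    predicted = [tuple(c + rng.gauss(0.0, motion_sigma) for c in p)
                 for p in particles]

    # Update: multiply the per-modality likelihoods; a missing modality
    # contributes a factor of 1.
    weights = []
    for p in predicted:
        w = 1.0
        if audio_obs is not None:
            w *= gaussian_likelihood(p, audio_obs, audio_sigma)
        if video_obs is not None:
            w *= gaussian_likelihood(p, video_obs, video_sigma)
        weights.append(w)

    total = sum(weights)
    if total == 0.0:
        # Degenerate case: every weight underflowed, fall back to uniform.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = tuple(sum(w * p[i] for w, p in zip(weights, predicted))
                     for i in range(3))

    # Resample: multinomial resampling with replacement.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

Calling `particle_filter_step` repeatedly with a seeded `random.Random` instance and observations near a fixed position should pull the estimate toward that position. Multiplying the two likelihoods treats the modalities as conditionally independent, which is a simplifying choice of this sketch.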
Similar resources
Differences between Speakers in Audio-visual Classification of Word Prominence
We show how the audio-visual discrimination performance of prominent from non-prominent words based on an SVM classifier varies from speaker to speaker. We collected data in an experiment where users were interacting via speech in a small game, designed as a Wizard-of-Oz experiment, with a computer. Following misunderstandings of one single word of the system, users were instructed to correct t...
Head Tracking of Auditory, Visual, and Audio-Visual Targets
The ability to actively follow a moving auditory target with our heads remains unexplored even though it is a common behavioral response. Previous studies of auditory motion perception have focused on the condition where the subjects are passive. The current study examined head tracking behavior to a moving auditory target along a horizontal 100° arc in the frontal hemisphere, with velocities r...
Audio-Visual Person Tracking - A Practical Approach
Joint Audio-Visual Tracking Using Particle Filters
It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a complementary modality to video data, which, in comparison to vision, can provide faster localization over a wider field of view. We present a particle-filter based tracking framework for performing multimodal sensor fusion for tracking people in a videoconfere...
Physiologically motivated audio-visual localisation and tracking
An audio-visual localisation and tracking system for meeting scenarios is presented which draws its inspiration from neurobiological processing. Meetings are recorded by a KEMAR binaural manikin and a single camera placed directly above the manikin. Source localisation from the binaural audio and face, object and motion locations from the video frames are used as input to two linked neural osci...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2021.3061800